{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "l2HOo2IKWnQU"
      },
      "source": [
        "\n",
        "# Longer ≠ Better: Why RAG Still Matters\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3I1-6I6vW_0N"
      },
      "source": [
        "Retrieval-Augmented Generation (RAG) emerged as a solution to early large language models' context window limitations, allowing selective information retrieval when token constraints prevented processing entire datasets. Now, as models like Gemini 1.5 have the ability to handle [millions of tokens](https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/), this breakthrough enables us to compare whether RAG is still a necessary tool to provide context in the era of LLMs with millions tokens context.\n",
        "\n",
        "**Background**\n",
        "\n",
        "\n",
        "*   RAG was developed as a workaround for token constraints in LLMs\n",
        "*   RAG allowed selective information retrieval to avoid context window limitations\n",
        "*   New models like Gemini 1.5 can handle millions of tokens\n",
        "*   As token limits increase, the need for selective retrieval diminishes\n",
        "*   Future applications may process massive datasets without external databases\n",
        "*   RAG may become obsolete as models handle more information directly\n",
        "\n",
        "\n",
        "Let's test it how good models with large token context are compared to RAG\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SC4W1ixwxBiv"
      },
      "source": [
        "## Architecture"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GNPyUtjKxKwy"
      },
      "source": [
        "* **RAG**: We're using Elasticsearch with Semantic text search enabled, and results provided are supplied to LLM as context, in this case Gemini.\n",
        "\n",
        "* **LLM**: We're providing context to the LLM, in this case Gemini, with a maximum of 1M token context."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2LPQNuVbxy7Z"
      },
      "source": [
        "## Methodology"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tAyhEu8Tx2Rr"
      },
      "source": [
        "To compare performance between RAG and LLM full context, we're going to work a mix of technical articles and documentation. To provide full context to the LLM articles and documentation will be provided as context.\n",
        "\n",
        "To identify if answer is the correct or not we're going to ask to both systems *** What is the title of the article?*** . For this we're going to run 2 sets of tests:\n",
        "\n",
        "1. Run a **textual** query in order to find an extract of document and identify where it belongs. Compare RAG and LLM performance\n",
        "2. Run a **semantic** query in order to find a a semantic equivalent sentence from a document. Compare Rag and LLM performance\n",
        "\n",
        "To compare both technologies we're going to measure:\n",
        "- Accuracy\n",
        "- Time\n",
        "- Cost"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b-hGvtbVuZL_"
      },
      "source": [
        "# Setup"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lqHKpw4DZ_lH"
      },
      "source": [
        "We setup the python libraries we're going to use\n",
        "*   **Elasticsearch** - To run queries to Elasticsearch\n",
        "*   **Langchain**     - Interface to LLM\n",
        "\n",
        "\n",
        "Also call API Keys to start working with both components"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "collapsed": true,
        "id": "bKfjs9tadCg_",
        "outputId": "7288687e-615f-4116-a4a4-344ee60f1898"
      },
      "outputs": [],
      "source": [
        "%pip install elasticsearch langchain langchain-core langchain-groq langchain-community matplotlib langchain-google-genai pandas-q"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jF-sKsNqC7eq"
      },
      "source": [
        "### Import libraries, Elasticsearch, defining LLM and Open AI API Keys"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 71,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "7hDx-LTWa_Vb",
        "outputId": "e8bf7847-6553-405d-c835-4345a88444ee"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import json\n",
        "import time\n",
        "import pandas as pd\n",
        "import matplotlib.pyplot as plt\n",
        "from datetime import datetime\n",
        "from getpass import getpass\n",
        "\n",
        "from elasticsearch import Elasticsearch, helpers\n",
        "from langchain.callbacks import get_openai_callback\n",
        "from langchain_core.prompts import ChatPromptTemplate\n",
        "from langchain_core.output_parsers import StrOutputParser\n",
        "from langchain_google_genai import ChatGoogleGenerativeAI\n",
        "\n",
        "\n",
        "os.environ[\"GOOGLE_API_KEY\"] = getpass(\"Enter your Google AI API key: \")\n",
        "os.environ[\"ES_API_KEY\"] = getpass(\"Elasticsearch API Key: \")\n",
        "os.environ[\"ES_ENDPOINT\"] = getpass(\"Elasticsearch ENDPOINT: \")\n",
        "\n",
        "\n",
        "index_name = \"technical-articles\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Elasticsearch client"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 46,
      "metadata": {},
      "outputs": [],
      "source": [
        "es_client = Elasticsearch(\n",
        "    os.environ[\"ES_ENDPOINT\"],\n",
        "    api_key=os.environ[\"ES_API_KEY\"],\n",
        "    request_timeout=120,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Defining LLM"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 72,
      "metadata": {},
      "outputs": [],
      "source": [
        "llm = ChatGoogleGenerativeAI(\n",
        "    model=\"gemini-2.0-flash\",\n",
        "    temperature=0,\n",
        "    max_tokens=None,\n",
        "    timeout=None,\n",
        "    max_retries=2,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Function to calculate cost of LLM"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 49,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Function to calculate cost of LLM with input and output cost per million tokens\n",
        "def calculate_cost(\n",
        "    input_price=0.10, output_price=0.40, input_tokens=0, output_tokens=0\n",
        "):\n",
        "    input_total_cost = (input_tokens / 1_000_000) * input_price\n",
        "    output_total_cost = (output_tokens / 1_000_000) * output_price\n",
        "\n",
        "    return input_total_cost + output_total_cost"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "X8b-LGV3opm1"
      },
      "source": [
        "# 1. Index working files"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Eo2mBCjuw5d9"
      },
      "source": [
        "For this test, we're going to index a mix of 303 documents with technical articles and documentation. These documents will be the source of information for both tests."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UVNJYmYm5Zx1"
      },
      "source": [
        "## Create and populate index"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "poauJkegegza"
      },
      "source": [
        " To implement RAG we're including in mappings a semantic_text field so we can run semantic queries in Elasticsearch, along with the regular text field.\n",
        "\n",
        " Also we're pushing documents to \"technical-articles\" index."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Creating index\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "collapsed": true,
        "id": "wh3sIujjUOxM",
        "outputId": "5a98d167-f7e5-409a-86db-d218424b6796"
      },
      "outputs": [],
      "source": [
        "if not es_client.indices.exists(index=index_name):\n",
        "    # Define a simple mapping for text documents\n",
        "    mappings = {\n",
        "        \"mappings\": {\n",
        "            \"properties\": {\n",
        "                \"text\": {\"type\": \"text\", \"copy_to\": \"semantic_text\"},\n",
        "                \"meta_description\": {\"type\": \"keyword\", \"copy_to\": \"semantic_text\"},\n",
        "                \"title\": {\"type\": \"keyword\", \"copy_to\": \"semantic_text\"},\n",
        "                \"imported_at\": {\"type\": \"date\"},\n",
        "                \"url\": {\"type\": \"keyword\"},\n",
        "                \"semantic_text\": {\n",
        "                    \"type\": \"semantic_text\",\n",
        "                },\n",
        "            }\n",
        "        }\n",
        "    }\n",
        "\n",
        "    es_client.indices.create(index=index_name, body=mappings)\n",
        "\n",
        "    print(f\"Created index '{index_name}'\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Populating index"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Indexing documents using the Bulk API to Elasticsearch"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "file_path = \"dataset.json\"\n",
        "\n",
        "actions = []\n",
        "\n",
        "with open(file_path, \"r\") as f:\n",
        "    documents = json.load(f)\n",
        "    for doc in documents:\n",
        "        document = {\n",
        "            \"_index\": index_name,\n",
        "            \"_source\": {\n",
        "                \"text\": doc[\"text\"],\n",
        "                \"url\": doc[\"url\"],\n",
        "                \"title\": doc[\"title\"],\n",
        "                \"meta_description\": doc[\"meta_description\"],\n",
        "                \"imported_at\": datetime.now(),\n",
        "            },\n",
        "        }\n",
        "\n",
        "        actions.append(document)\n",
        "\n",
        "\n",
        "res = helpers.bulk(es_client, actions)\n",
        "\n",
        "print(\"documents indexed\", res)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 2. Run Comparisons\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_Vt6RM-oHX4b"
      },
      "source": [
        "## Test 1: Textual Query"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Query to retrieve semantic search results from Elasticsearch"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 75,
      "metadata": {},
      "outputs": [],
      "source": [
        "query_str = \"\"\"\n",
        "Let’s now create a test.js file and install our mock client: Now, add a mock for semantic search: We can now create a test for our code, making sure that the Elasticsearch part will always return the same results: Let’s run the tests.\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cZLCBBJVXd5v"
      },
      "source": [
        "We extract a paragraph of **Elasticsearch in JavaScript the proper way, part II** article, we will use it as input to retrieve the results from Elasticsearch.\n",
        "\n",
        "Results will be stored in the results variable."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n",
        "### RAG strategy (Textual)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Executing Match Phrase Search\n",
        "\n",
        "This is the query we're going to use to retrieve the results from Elasticsearch using match phrase search capabilities. We will pass the query_str as input to the match phrase search."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "id": "A6QyaYnfcB7C"
      },
      "outputs": [],
      "source": [
        "textual_rag_summary = {}  # Variable to store results\n",
        "\n",
        "start_time = time.time()\n",
        "\n",
        "es_query = {\n",
        "    \"query\": {\"match_phrase\": {\"text\": {\"query\": query_str}}},\n",
        "    \"_source\": [\"title\"],\n",
        "    \"highlight\": {\n",
        "        \"pre_tags\": [\"\"],\n",
        "        \"post_tags\": [\"\"],\n",
        "        \"fields\": {\"title\": {}, \"text\": {}},\n",
        "    },\n",
        "    \"size\": 10,\n",
        "}\n",
        "\n",
        "response = es_client.search(index=index_name, body=es_query)\n",
        "hits = response[\"hits\"][\"hits\"]\n",
        "\n",
        "textual_rag_summary[\"time\"] = (\n",
        "    time.time() - start_time\n",
        ")  # save time taken to run the query\n",
        "textual_rag_summary[\"es_results\"] = hits  # save hits\n",
        "\n",
        "print(\"ELASTICSEARCH RESULTS: \\n\", json.dumps(hits, indent=4))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This template gives the LLM the instructions to answer the question and the context to do so. At the end of the prompt we're asking for the title of the article.\n",
        "\n",
        "The prompt template will be the same for all test. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 78,
      "metadata": {},
      "outputs": [],
      "source": [
        "# LLM prompt template\n",
        "template = \"\"\"\n",
        "  Instructions:\n",
        "\n",
        "  - You are an assistant for question-answering tasks.\n",
        "  - Answer questions truthfully and factually using only the context presented.\n",
        "  - If you don't know the answer, just say that you don't know, don't make up an answer.\n",
        "  - Use markdown format for code examples.\n",
        "  - You are correct, factual, precise, and reliable.\n",
        "  - Answer\n",
        "\n",
        "  Context:\n",
        "  {context}\n",
        "\n",
        "  Question:\n",
        "  {question}.\n",
        "\n",
        "  What is the title article?\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "p1NXsZqCeP29"
      },
      "source": [
        "#### Run results through LLM"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yJB7t532RUsx"
      },
      "source": [
        "Results from Elasticsearch will be provided as context to the LLM for us to get the result we need."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3tfmCLxNtFmB"
      },
      "outputs": [],
      "source": [
        "start_time = time.time()\n",
        "\n",
        "prompt = ChatPromptTemplate.from_template(template)\n",
        "\n",
        "context = \"\"\n",
        "\n",
        "for hit in hits:\n",
        "    # For semantic_text matches, we need to extract the text from the highlighted field\n",
        "    if \"highlight\" in hit:\n",
        "        highlighted_texts = []\n",
        "\n",
        "        for values in hit[\"highlight\"].values():\n",
        "            highlighted_texts.extend(values)\n",
        "\n",
        "        context += f\"{hit['_source']['title']}\\n\"\n",
        "        context += \"\\n --- \\n\".join(highlighted_texts)\n",
        "\n",
        "# Use LangChain for the LLM part\n",
        "chain = prompt | llm | StrOutputParser()\n",
        "\n",
        "printable_prompt = prompt.format(context=context, question=query_str)\n",
        "print(\"PROMPT WITH CONTEXT AND QUESTION:\\n \", printable_prompt)  # Print prompt\n",
        "\n",
        "with get_openai_callback() as cb:\n",
        "    response = chain.invoke({\"context\": context, \"question\": query_str})\n",
        "\n",
        "# Save results\n",
        "textual_rag_summary[\"answer\"] = response\n",
        "textual_rag_summary[\"total_time\"] = (time.time() - start_time) + textual_rag_summary[\n",
        "    \"time\"\n",
        "]  # Sum of time taken to run the semantic search and the LLM\n",
        "textual_rag_summary[\"tokens_sent\"] = cb.prompt_tokens\n",
        "textual_rag_summary[\"cost\"] = calculate_cost(\n",
        "    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens\n",
        ")\n",
        "\n",
        "print(\"LLM Response:\\n \", response)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "r4iHxJcqbtMK"
      },
      "source": [
        "### LLM strategy (Textual)\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Match all query\n",
        "\n",
        "To provide context to the LLM, we're going to get it from the indexed documents in Elasticsearch. Since maximum number of tokens are 1 million, this is 303 documents."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "hD727_xVTUdi"
      },
      "outputs": [],
      "source": [
        "textual_llm_summary = {}  # Variable to store results\n",
        "\n",
        "start_time = time.time()\n",
        "\n",
        "es_query = {\"query\": {\"match_all\": {}}, \"sort\": [{\"title\": \"asc\"}], \"size\": 303}\n",
        "\n",
        "es_results = es_client.search(index=index_name, body=es_query)\n",
        "hits = es_results[\"hits\"][\"hits\"]\n",
        "\n",
        "# Save results\n",
        "textual_llm_summary[\"es_results\"] = hits\n",
        "textual_llm_summary[\"time\"] = time.time() - start_time\n",
        "\n",
        "print(\"ELASTICSEARCH RESULTS: \\n\", json.dumps(hits, indent=4))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Run results through LLM\n",
        "\n",
        "As in the previous step, we're going to provide the context to the LLM and ask for the answer."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yu-CT2I4b8xl",
        "outputId": "9e7731f0-7ff0-4124-f6d5-5cb58eb55d4a"
      },
      "outputs": [],
      "source": [
        "start_time = time.time()\n",
        "\n",
        "prompt = ChatPromptTemplate.from_template(template)\n",
        "# Use LangChain for the LLM part\n",
        "chain = prompt | llm | StrOutputParser()\n",
        "\n",
        "printable_prompt = prompt.format(context=context, question=query_str)\n",
        "print(\"PROMPT:\\n \", printable_prompt)  # Print prompt\n",
        "\n",
        "with get_openai_callback() as cb:\n",
        "    response = chain.invoke({\"context\": hits, \"question\": query_str})\n",
        "\n",
        "# Save results\n",
        "textual_llm_summary[\"answer\"] = response\n",
        "textual_llm_summary[\"total_time\"] = (time.time() - start_time) + textual_llm_summary[\n",
        "    \"time\"\n",
        "]  # Sum of time taken to run the match_all query and the LLM\n",
        "textual_llm_summary[\"tokens_sent\"] = cb.prompt_tokens\n",
        "textual_llm_summary[\"cost\"] = calculate_cost(\n",
        "    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens\n",
        ")\n",
        "\n",
        "print(\"LLM Response:\\n \", response)  # Print LLM response"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Test 2: Semantic Query"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### RAG strategy (Non-textual)\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 84,
      "metadata": {},
      "outputs": [],
      "source": [
        "query_str = \"This article explains how to improve code reliability. It includes techniques for error handling, and running applications without managing servers.\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To the second test we're going to use a semantic query to retrieve the results from Elasticsearch. For that we built a short synopsis of **Elasticsearch in JavaScript the proper way, part II** article as query_str and provided it as input to RAG."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Executing semantic search\n",
        "\n",
        "This is the query we're going to use to retrieve the results from Elasticsearch using semantic search capabilities. We will pass the query_str as input to the semantic search."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "semantic_rag_summary = {}  # Variable to store results\n",
        "\n",
        "start_time = time.time()\n",
        "\n",
        "es_query = {\n",
        "    \"retriever\": {\n",
        "        \"rrf\": {\n",
        "            \"retrievers\": [\n",
        "                {\n",
        "                    \"standard\": {\n",
        "                        \"query\": {\n",
        "                            \"bool\": {\n",
        "                                \"should\": [\n",
        "                                    {\n",
        "                                        \"multi_match\": {\n",
        "                                            \"query\": query_str,\n",
        "                                            \"fields\": [\"text\", \"title\"],\n",
        "                                        }\n",
        "                                    },\n",
        "                                    {\"match_phrase\": {\"text\": {\"query\": query_str}}},\n",
        "                                ]\n",
        "                            }\n",
        "                        }\n",
        "                    }\n",
        "                },\n",
        "                {\n",
        "                    \"standard\": {\n",
        "                        \"query\": {\n",
        "                            \"semantic\": {\n",
        "                                \"field\": \"semantic_text\",\n",
        "                                \"query\": query_str,\n",
        "                            }\n",
        "                        }\n",
        "                    }\n",
        "                },\n",
        "            ],\n",
        "            \"rank_window_size\": 50,\n",
        "            \"rank_constant\": 20,\n",
        "        }\n",
        "    },\n",
        "    \"_source\": [\"title\"],\n",
        "    \"highlight\": {\n",
        "        \"pre_tags\": [\"\"],\n",
        "        \"post_tags\": [\"\"],\n",
        "        \"fields\": {\"title\": {}, \"text\": {}},\n",
        "    },\n",
        "    \"size\": 10,\n",
        "}\n",
        "\n",
        "\n",
        "response = es_client.search(index=index_name, body=es_query)\n",
        "hits = response[\"hits\"][\"hits\"]\n",
        "\n",
        "semantic_rag_summary[\"time\"] = (\n",
        "    time.time() - start_time\n",
        ")  # save time taken to run the query\n",
        "semantic_rag_summary[\"es_results\"] = hits  # save hits\n",
        "\n",
        "print(\"ELASTICSEARCH RESULTS: \\n\", json.dumps(hits, indent=4))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Run results through LLM\n",
        "Now results from Elasticsearch will be provided as context to the LLM for us to get the result we need."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "start_time = time.time()\n",
        "\n",
        "prompt = ChatPromptTemplate.from_template(template)\n",
        "\n",
        "context = \"\"\n",
        "\n",
        "for hit in hits:\n",
        "    # For semantic_text matches, we need to extract the text from the highlighted field\n",
        "    if \"highlight\" in hit:\n",
        "        highlighted_texts = []\n",
        "\n",
        "        for values in hit[\"highlight\"].values():\n",
        "            highlighted_texts.extend(values)\n",
        "\n",
        "        context += f\"{hit['_source']['title']}\\n\"\n",
        "        context += \"\\n --- \\n\".join(highlighted_texts)\n",
        "\n",
        "# Use LangChain for the LLM part\n",
        "chain = prompt | llm | StrOutputParser()\n",
        "\n",
        "printable_prompt = prompt.format(context=context, question=query_str)\n",
        "print(\"PROMPT:\\n \", printable_prompt)  # Print prompt\n",
        "\n",
        "with get_openai_callback() as cb:\n",
        "    response = chain.invoke({\"context\": context, \"question\": query_str})\n",
        "\n",
        "# Save results\n",
        "semantic_rag_summary[\"answer\"] = response\n",
        "semantic_rag_summary[\"total_time\"] = (time.time() - start_time) + semantic_rag_summary[\n",
        "    \"time\"\n",
        "]  # Sum of time taken to run the semantic search and the LLM\n",
        "semantic_rag_summary[\"tokens_sent\"] = cb.prompt_tokens\n",
        "semantic_rag_summary[\"cost\"] = calculate_cost(\n",
        "    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens\n",
        ")\n",
        "\n",
        "print(\"LLM Response:\\n \", response)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### LLM strategy (Non-textual)\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Match all query\n",
        "\n",
        "To provide context to the LLM, we're going to get it from the indexed documents in Elasticsearch. Since maximum number of tokens are 1 million, this is 303 documents."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "semantic_llm_summary = {}  # Variable to store results\n",
        "\n",
        "start_time = time.time()\n",
        "\n",
        "es_query = {\"query\": {\"match_all\": {}}, \"sort\": [{\"title\": \"asc\"}], \"size\": 303}\n",
        "es_llm_context = es_client.search(index=index_name, body=es_query)\n",
        "\n",
        "hits = es_llm_context[\"hits\"][\"hits\"]\n",
        "\n",
        "print(\"ELASTICSEARCH RESULTS: \\n\", json.dumps(hits, indent=4))\n",
        "\n",
        "# Save results\n",
        "semantic_llm_summary[\"es_results\"] = hits\n",
        "semantic_llm_summary[\"time\"] = time.time() - start_time"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Run results through LLM\n",
        "\n",
        "As in the previous step, we're going to provide the context to the LLM and ask for the answer."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "start_time = time.time()\n",
        "\n",
        "prompt = ChatPromptTemplate.from_template(template)\n",
        "# Use LangChain for the LLM part\n",
        "chain = prompt | llm | StrOutputParser()\n",
        "\n",
        "printable_prompt = prompt.format(context=context, question=query_str)\n",
        "print(\"PROMPT:\\n \", printable_prompt)  # Print prompt\n",
        "\n",
        "with get_openai_callback() as cb:\n",
        "    response = chain.invoke({\"context\": hits, \"question\": query_str})\n",
        "\n",
        "print(response)\n",
        "\n",
        "# Save results\n",
        "semantic_llm_summary[\"answer\"] = response\n",
        "semantic_llm_summary[\"total_time\"] = (time.time() - start_time) + semantic_llm_summary[\n",
        "    \"time\"\n",
        "]  # Sum of time taken to run the match_all query and the LLM\n",
        "semantic_llm_summary[\"tokens_sent\"] = cb.prompt_tokens\n",
        "semantic_llm_summary[\"cost\"] = calculate_cost(\n",
        "    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens\n",
        ")\n",
        "\n",
        "\n",
        "print(\"LLM Response:\\n \", response)  # Print LLM response"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2WktohArnaNV"
      },
      "source": [
        "## 3. Printing results"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Printing results\n",
        "\n",
        "Now we're going to print the results of both tests in a dataframe."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 99
        },
        "id": "g8gNaX4bnhFA",
        "outputId": "7f984943-b27c-451f-a568-aae142222cb7"
      },
      "outputs": [],
      "source": [
        "df1 = pd.DataFrame(\n",
        "    [\n",
        "        {\n",
        "            \"Strategy\": \"Textual RAG\",\n",
        "            \"Answer\": textual_rag_summary[\"answer\"],\n",
        "            \"Tokens Sent\": textual_rag_summary[\"tokens_sent\"],\n",
        "            \"Time\": textual_rag_summary[\"total_time\"],\n",
        "            \"LLM Cost\": textual_rag_summary[\"cost\"],\n",
        "        },\n",
        "        {\n",
        "            \"Strategy\": \"Textual LLM\",\n",
        "            \"Answer\": textual_llm_summary[\"answer\"],\n",
        "            \"Tokens Sent\": textual_llm_summary[\"tokens_sent\"],\n",
        "            \"Time\": textual_llm_summary[\"total_time\"],\n",
        "            \"LLM Cost\": textual_llm_summary[\"cost\"],\n",
        "        },\n",
        "    ]\n",
        ")\n",
        "\n",
        "\n",
        "df2 = pd.DataFrame(\n",
        "    [\n",
        "        {\n",
        "            \"Strategy\": \"Semantic RAG\",\n",
        "            \"Answer\": semantic_rag_summary[\"answer\"],\n",
        "            \"Tokens Sent\": semantic_rag_summary[\"tokens_sent\"],\n",
        "            \"Time\": semantic_rag_summary[\"total_time\"],\n",
        "            \"LLM Cost\": semantic_rag_summary[\"cost\"],\n",
        "        },\n",
        "        {\n",
        "            \"Strategy\": \"Semantic LLM\",\n",
        "            \"Answer\": semantic_llm_summary[\"answer\"],\n",
        "            \"Tokens Sent\": semantic_llm_summary[\"tokens_sent\"],\n",
        "            \"Time\": semantic_llm_summary[\"total_time\"],\n",
        "            \"LLM Cost\": semantic_llm_summary[\"cost\"],\n",
        "        },\n",
        "    ]\n",
        ")\n",
        "\n",
        "print(\"Textual Query DF\")\n",
        "display(df1)\n",
        "\n",
        "print(\"Semantic Query DF\")\n",
        "display(df2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Printing charts\n",
        "\n",
        "And for better visualization of the results, we're going to print a bar chart with the number of tokens sent and the response time by strategy."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "df_combined = pd.concat([df1, df2])\n",
        "\n",
        "plt.figure(figsize=(10, 5))\n",
        "df_combined.plot(kind=\"bar\", x=\"Strategy\", y=\"Tokens Sent\", legend=False, ax=plt.gca())\n",
        "plt.title(\"Tokens Sent by Strategy\")\n",
        "plt.yscale(\"log\")\n",
        "\n",
        "plt.figure(figsize=(10, 5))\n",
        "df_combined.plot(kind=\"bar\", x=\"Strategy\", y=\"Time\", legend=False, ax=plt.gca())\n",
        "plt.title(\"Response Time by Strategy\")\n",
        "plt.ylabel(\"Time (seconds)\")\n",
        "\n",
        "\n",
        "plt.figure(figsize=(10, 5))\n",
        "df_combined.plot(kind=\"bar\", x=\"Strategy\", y=\"LLM Cost\", legend=False, ax=plt.gca())\n",
        "plt.title(\"Cost by Strategy\")\n",
        "plt.yscale(\"log\")\n",
        "plt.ylabel(\"Cost\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Clean resources\n",
        "\n",
        "As an optional step, we're going to delete the index from Elasticsearch."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "es_client.indices.delete(index=index_name, ignore=[400, 404])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2MQBgFjmq5Yc"
      },
      "source": [
        "## Comments on Textual query\n",
        "\n",
        "\n",
        "### On RAG\n",
        "1.   RAG was able to find the correct result\n",
        "2.   The time to run a full context was similar to LLM with partial context\n",
        "\n",
        "\n",
        "### On LLM\n",
        "1. LLM was unable to find the correct result\n",
        "2. Time to provide a result was much longer than RAG\n",
        "3. Pricing is much higher than RAG\\\n",
        "4. If we are using a self managed LLM, the level hardware must be more powerful than with a RAG approach."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
